The highest fidelity of quantum error-correcting codes of length $n$ and rate $R$ is
proven to be lower bounded by $1 - \exp[-nE(R) + o(n)]$ for some function $E(R)$
on noisy quantum channels that are subject to not necessarily independent errors.
The function $E(R)$ is positive below some threshold $R_0$, which implies that $R_0$ is a lower bound
on the quantum capacity. This work is an extension of the author's previous works
[M. Hamada, Phys. Rev. A, 65, 052305 (2002), e-Print quant-ph/0109114, LANL,
2001, and M. Hamada, e-Print quant-ph/0112103, LANL, 2001], which presented
the bound for channels subject to independent errors, or channels modeled as tensor
products of copies of a completely positive linear map. The relation of the channel
class treated in this paper to those in the previous works is similar to that of
Markov chains to sequences of independent identically distributed random variables.

I. INTRODUCTION

Quantum error-correcting codes (simply called quantum codes or codes in this work) were
discovered by Shor and Steane as schemes that protect quantum states from decoherence
during quantum computation. Shor not only gave the first quantum code but also posed
the problem of determining the quantum analog of Shannon's channel capacity. In classical
information theory, channels with independent errors are called memoryless channels and
channels with correlated errors are called channels with memory; this terminology will be applied to
quantum channels as well in the present work. On quantum memoryless channels, several
bounds on the quantum capacity have been known, and exponential convergence
of the fidelity of codes was recently proved by the present author. It is natural to ask whether
such bounds and exponential convergence hold true on channels with memory; this question
is answered affirmatively in this work.

While one of the greatest incentives to investigate quantum codes is the need for them in quantum
computing, it is currently unclear which devices will be used for this purpose. Hence, we do
not know which channel models are appropriate, and treating general channels is
among the things we can do now. Thus, this paper analyzes the code performance on a
class of quantum channels that is much wider than those treated in the literature.

In the proof of the main result below, the method of types, which is a powerful tool from
classical information theory, plays an important role. This method was exploited by
the Hungarian mathematician (information theorist) Csiszár and coworkers around 1980 to
present the strongest coding theorems such as the one showing the existence of universal
channel codes asymptotically as good as any codes. It has often produced results in
elementary enumerative manners, which is also the case in this paper.



II. MAIN RESULT FOR SIMPLE CASE 

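As usual, all quantum channels and decoding (state-recovery) operations in coding
systems are described in terms of trace-preserving completely positive (TPCP) linear
maps. Given a Hilbert space $\mathsf{H}$ of finite dimension, let $\mathsf{L}(\mathsf{H})$ denote the set of
linear operators on $\mathsf{H}$. In general, every CP linear map $\mathcal{M} : \mathsf{L}(\mathsf{H}) \to \mathsf{L}(\mathsf{H})$ has an operator-sum
representation $\mathcal{M}(\rho) = \sum_{i \in \mathcal{X}} M_i \rho M_i^\dagger$ with some $M_i \in \mathsf{L}(\mathsf{H})$, $i \in \mathcal{X}$. When $\mathcal{M}$ is
specified by a set of operators $\{M_i\}_{i \in \mathcal{X}}$ in this way, we write $\mathcal{M} \sim \{M_i\}_{i \in \mathcal{X}}$.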

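Hereafter, $\mathsf{H}$ denotes an arbitrarily fixed Hilbert space of dimension $d$, which is a prime
number. A quantum channel is a sequence of TPCP linear maps $\{\mathcal{A}_n : \mathsf{L}(\mathsf{H}^{\otimes n}) \to \mathsf{L}(\mathsf{H}^{\otimes n})\}$.
We want a large subspace $\mathcal{C}_n \subset \mathsf{H}^{\otimes n}$ every state vector in which remains almost unchanged
after the effect of a channel followed by some suitable recovery operation $\mathcal{R}_n : \mathsf{L}(\mathsf{H}^{\otimes n}) \to \mathsf{L}(\mathsf{H}^{\otimes n})$.
A pair $(\mathcal{C}_n, \mathcal{R}_n)$ consisting of such a subspace $\mathcal{C}_n$ and a TPCP map $\mathcal{R}_n$ is called a
code and its performance is evaluated in terms of minimum fidelity

$$F(\mathcal{C}_n, \mathcal{R}_n\mathcal{A}_n) = \min_{|\psi\rangle \in \mathcal{C}_n} \langle\psi|\, \mathcal{R}_n\mathcal{A}_n(|\psi\rangle\langle\psi|)\, |\psi\rangle,$$

where $\mathcal{R}_n\mathcal{A}_n$ denotes the composition of $\mathcal{A}_n$ and $\mathcal{R}_n$. Throughout, bras $\langle\cdot|$ and kets $|\cdot\rangle$ are
assumed normalized. A subspace $\mathcal{C}_n$ alone is also called a code assuming implicitly some
recovery operator.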

Let $F^*_{n,k}(\mathcal{A}_n)$ denote the supremum of $F(\mathcal{C}_n, \mathcal{R}_n\mathcal{A}_n)$ over all codes $(\mathcal{C}_n, \mathcal{R}_n)$
with $\log_d \dim \mathcal{C}_n \ge k$, where $n$ is a positive integer and $k$ is a nonnegative real number. Our
goal is to estimate $F^*_{n,k}(\mathcal{A}_n)$ as precisely as possible.

First, we state the main result for an easy case, and give a more general statement later.
Fix an orthonormal basis $\{|0\rangle, \dots, |d-1\rangle\}$ of $\mathsf{H}$. Put $\mathcal{X} = \{0, \dots, d-1\}^2$ and $N_{(i,j)} = X^i Z^j$
for $(i,j) \in \mathcal{X}$. Here, $X, Z \in \mathsf{L}(\mathsf{H})$ are Weyl's unitaries, which could be viewed as generalized
Pauli operators, and are defined by

$$X|j\rangle = |(j-1) \bmod d\rangle, \qquad Z|j\rangle = \omega^j |j\rangle, \tag{1}$$

where $\omega$ is a primitive $d$-th root of unity. From the $\mathsf{L}(\mathsf{H})$ basis $\{N_{(i,j)}\}$, we obtain
a basis $\mathsf{N}_n = \{N_x \mid x \in \mathcal{X}^n\}$ of $\mathsf{L}(\mathsf{H}^{\otimes n})$, where $N_x = N_{x_1} \otimes \dots \otimes N_{x_n}$ for $x = (x_1, \dots, x_n) \in \mathcal{X}^n$.
The first channel class to be considered here consists of those $\{\mathcal{A}_n\}$ such that
$\mathcal{A}_n \sim \{\sqrt{P_n(x)}\, N_x\}_{x \in \mathcal{X}^n}$, where we assume that $P_n$ are the probability distributions of a
(first-order) homogeneous Markov chain, i.e., that $P_n$ has the form

$$P_n(x_1, \dots, x_n) = p(x_1) \prod_{j=1}^{n-1} P(x_{j+1}|x_j) \tag{2}$$

with some transition probabilities $P(v|u)$, $u, v \in \mathcal{X}$, and some initial distribution $p$. These
are generalizations of the so-called depolarizing channel; see Ruskai et al. for a thorough
analysis of memoryless channels with $d = 2$.
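The operators in (1) are easy to write down explicitly. The following is a minimal numerical sketch, assuming NumPy and the particular choice $\omega = e^{2\pi i/d}$ (any primitive $d$-th root of unity would do); the function names `weyl_unitaries`, `weyl_basis` and `tensor_basis_element` are illustrative and not taken from the paper.

```python
import numpy as np

def weyl_unitaries(d):
    """Return (X, Z, omega) with X|j> = |(j-1) mod d> and Z|j> = omega^j |j>."""
    omega = np.exp(2j * np.pi / d)  # assumed choice of primitive d-th root of unity
    X = np.zeros((d, d), dtype=complex)
    for j in range(d):
        X[(j - 1) % d, j] = 1.0  # column j is the image of the basis vector |j>
    Z = np.diag(omega ** np.arange(d))
    return X, Z, omega

def weyl_basis(d):
    """Return {(i, j): X^i Z^j} for all (i, j) in {0, ..., d-1}^2."""
    X, Z, _ = weyl_unitaries(d)
    return {
        (i, j): np.linalg.matrix_power(X, i) @ np.linalg.matrix_power(Z, j)
        for i in range(d)
        for j in range(d)
    }

def tensor_basis_element(d, x):
    """N_x = N_{x_1} (x) ... (x) N_{x_n} for a sequence x of index pairs."""
    basis = weyl_basis(d)
    out = np.eye(1, dtype=complex)
    for index in x:
        out = np.kron(out, basis[index])
    return out
```

With these definitions one can check numerically that $XZ = \omega ZX$ and that the $d^2$ operators $X^iZ^j$ are mutually orthogonal with respect to the trace inner product, which is what makes them a basis of $\mathsf{L}(\mathsf{H})$.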

Given a probability distribution $Q$ on $\mathcal{X}^2$, we let $\bar{Q}$ and $\bar{\bar{Q}}$ denote the two marginal
distributions:

$$\bar{Q}(u) = \sum_{v \in \mathcal{X}} Q(u,v), \qquad \bar{\bar{Q}}(u) = \sum_{v \in \mathcal{X}} Q(v,u), \qquad u \in \mathcal{X}.$$

The classical (conditional) Kullback-Leibler information (informational divergence or relative
entropy) is denoted by $D$ and entropy by $H$. Specifically, for a probability distribution
$Q$ on $\mathcal{X}^2$, transition (or conditional) probabilities $P(v|u)$, $u, v \in \mathcal{X}$, and a probability
distribution $p$ on $\mathcal{X}$, we define $Q(\cdot|\cdot)$ by $Q(v|u) = Q(u,v)/\bar{Q}(u)$ for $\bar{Q}(u) > 0$, $D(Q\|P)$ by

$$D(Q\|P) = \sum_{u,v \in \mathcal{X}} Q(u,v) \log_d \frac{Q(v|u)}{P(v|u)},$$

and $H(P|p)$ by

$$H(P|p) = -\sum_{u \in \mathcal{X}:\, p(u) > 0}\ \sum_{v \in \mathcal{X}} p(u) P(v|u) \log_d P(v|u),$$

which is called the entropy of $P(\cdot|\cdot)$ conditional on $p$. We remark that $D(Q\|P)$ is a
conditional Kullback-Leibler information, so that in a more consistent notation, it would be
denoted by $D(Q(\cdot|\cdot)\,\|\,P\,|\,\bar{Q})$.

By convention, we assume $\log(a/0) = \infty$ for $a > 0$, and $0 \log 0 = 0 \log(0/0) = 0$. The first
form of this work's main result is the next one.

Theorem 1 Let a channel $\mathcal{A}_n \sim \{\sqrt{P_n(x)}\, N_x\}_{x \in \mathcal{X}^n}$, $n = 1, 2, \dots$, be specified by (2) with
some $P(\cdot|\cdot)$ and $p$. Then, for $0 \le R \le 1$, we have

$$\liminf_{n \to \infty} -\frac{1}{n} \log_d [1 - F^*_{n,Rn}(\mathcal{A}_n)] \ge E(R, P), \tag{3}$$

where

$$E(R, P) = \min_{Q:\ \bar{Q} = \bar{\bar{Q}}} \left[ D(Q\|P) + |1 - H(Q(\cdot|\cdot)\,|\,\bar{Q}) - R|^+ \right],$$

$|x|^+ = \max\{x, 0\}$, and the minimization with respect to $Q$ is taken over all probability
distributions on $\mathcal{X}^2$ with $\bar{Q} = \bar{\bar{Q}}$.

Remarks. Roughly speaking, the theorem says $F^*_{n,Rn}(\mathcal{A}_n) \gtrsim 1 - \exp_d[-nE(R, P)]$. An
immediate consequence of the theorem is that when the Markov chain is irreducible, the
quantum capacity of $\{\mathcal{A}_n\}$ is lower bounded by $1 - H(P|q)$, where $q$ is the unique
stationary (steady state, or equilibrium) distribution of the Markov chain. To see this,
observe that $E(R, P)$ is positive for $R < 1 - H(P|q)$ due to an easily established inequality
$D(Q\|P) \ge 0$, where equality occurs if and only if $Q(u,v) = q(u)P(v|u)$ for all $u, v \in \mathcal{X}$
under the constraint $\bar{Q} = \bar{\bar{Q}}$.
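The bound $1 - H(P|q)$ can be evaluated numerically for any irreducible transition matrix. The sketch below assumes NumPy, stores $P(v|u)$ in row $u$ and column $v$ of a matrix, and finds $q$ by least squares; the function names and the parameter values in the demonstration are hypothetical and serve only as an illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Solve q P = q with sum(q) = 1 for a row-stochastic matrix P (least squares)."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones((1, m))])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q

def conditional_entropy(P, p, base):
    """H(P|p) = -sum_{u: p(u)>0} sum_v p(u) P(v|u) log_base P(v|u), with 0 log 0 = 0."""
    total = 0.0
    for u in range(P.shape[0]):
        if p[u] <= 0:
            continue
        for v in range(P.shape[1]):
            if P[u, v] > 0:
                total -= p[u] * P[u, v] * np.log(P[u, v]) / np.log(base)
    return total

def capacity_lower_bound(P, d):
    """Return 1 - H(P|q), where q is the stationary distribution of P."""
    q = stationary_distribution(P)
    return 1.0 - conditional_entropy(P, q, d)

if __name__ == "__main__":
    # Hypothetical 4-state chain for d = 2 (|X| = d^2 = 4); values chosen for illustration.
    eps, gam = 0.01, 0.3
    P = np.array([[1 - eps, eps / 3, eps / 3, eps / 3]]
                 + [[1 - gam, gam / 3, gam / 3, gam / 3]] * 3)
    print(capacity_lower_bound(P, d=2))
```

The least-squares solve is a convenience for small matrices; uniqueness of $q$ relies on irreducibility, as assumed in the remark above.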

Example. Let us assume $d = 2$, rename the elements $(0,0), (1,0), (0,1), (1,1)$ in $\mathcal{X}$ as
$0, 1, 2, 3$, and define $P(v|u)$, $u, v \in \mathcal{X}$, by

$$P(v|u) = \begin{cases} 1 - \varepsilon & \text{if } u = 0 \text{ and } v = 0, \\ \varepsilon/3 & \text{if } u = 0 \text{ and } v \ne 0, \\ 1 - \gamma & \text{if } u \ne 0 \text{ and } v = 0, \\ \gamma/3 & \text{if } u \ne 0 \text{ and } v \ne 0. \end{cases}$$

In this case, $\{\mathcal{A}_n\}$ is analogous to the channel with memory discussed by Gilbert in the
context of classical channel coding (see also Gallager, Sec. 4.6). If we brought Gilbert's idea
into our quantum case innocently, we might assume $0 < \varepsilon < \gamma < 1$ and interpret 0 as 'good
state,' 1, 2, 3 as 'bad ones,' where a state means that of the Markov chain, not a quantum
state, and $\varepsilon$ (resp., $\gamma$) as the probability of going into a 'bad state' provided the current
state be 'good (resp., bad).' For the above quantum channel, the lower bound $1 - H(P|q)$
becomes

$$1 - \frac{(1-\gamma)[h(\varepsilon) + \varepsilon \log_2 3] + \varepsilon[h(\gamma) + \gamma \log_2 3]}{1 - \gamma + \varepsilon},$$

where $h$ is the binary entropy function $h(z) = -z \log_2 z - (1-z) \log_2(1-z)$. Note that when
$\varepsilon = \gamma$, the channel becomes the depolarizing channel and the lower bound on the capacity
becomes the known one.

III. PROOF OF THEOREM 1

A. Codes Based on Symplectic Geometry

The codes to be proven to have the desired performance are symplectic (stabilizer, or
additive) codes. Let us recall first the basics of symplectic codes. We can regard
the index of $N_{(i,j)} = X^i Z^j$, $(i,j) \in \mathcal{X}$, as a pair of elements from the field $\mathbb{F} = \mathbb{F}_d =
\mathbb{Z}/d\mathbb{Z}$, the finite field consisting of $d$ elements. Recall we put $N_x = N_{x_1} \otimes \dots \otimes N_{x_n}$ for
$x = (x_1, \dots, x_n) \in (\mathbb{F}^2)^n$. We write $\mathsf{N}_J$ for $\{N_x \in \mathsf{N}_n \mid x \in J\}$ where $J \subset (\mathbb{F}^2)^n$. The index
$((u_1, v_1), \dots, (u_n, v_n)) \in (\mathbb{F}^2)^n$ of a basis element can be regarded as the plain $2n$-dimensional
vector

$$x = (u_1, v_1, \dots, u_n, v_n) \in \mathbb{F}^{2n}.$$

We can equip the vector space $\mathbb{F}^{2n}$ over $\mathbb{F}$ with a symplectic bilinear form (symplectic pairing),
which is defined by

$$(x, y)_{\rm sp} = \sum_{i=1}^{n} (u_i v'_i - v_i u'_i)$$

for the above $x$ and $y = (u'_1, v'_1, \dots, u'_n, v'_n) \in \mathbb{F}^{2n}$. Given a subspace $L \subset \mathbb{F}^{2n}$, let

$$L^\perp = \{x \in \mathbb{F}^{2n} \mid \forall y \in L,\ (x, y)_{\rm sp} = 0\}.$$
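The symplectic form and the self-orthogonality condition $L \subset L^\perp$ used below can be checked mechanically. The following sketch uses only plain Python; vectors are tuples $(u_1, v_1, \dots, u_n, v_n)$ with entries in $\{0, \dots, d-1\}$. The helper `pauli_string_to_vector` and the mapping X to (1,0), Z to (0,1), Y to (1,1) (ignoring phases) are illustrative assumptions for $d = 2$, following the convention $N_{(i,j)} = X^iZ^j$.

```python
def symplectic_form(x, y, d):
    """(x, y)_sp = sum_i (u_i v'_i - v_i u'_i) mod d for x = (u_1, v_1, ..., u_n, v_n)."""
    assert len(x) == len(y) and len(x) % 2 == 0
    total = 0
    for i in range(0, len(x), 2):
        total += x[i] * y[i + 1] - x[i + 1] * y[i]
    return total % d

def is_self_orthogonal(generators, d):
    """True if the span L of the generators satisfies L subset of L-perp.

    By bilinearity it suffices to test the generators pairwise.
    """
    return all(symplectic_form(g, h, d) == 0 for g in generators for h in generators)

def pauli_string_to_vector(s):
    """Illustrative helper for d = 2: map a string over I, X, Y, Z to a vector in F_2^{2n}."""
    table = {"I": (0, 0), "X": (1, 0), "Z": (0, 1), "Y": (1, 1)}
    return tuple(c for ch in s for c in table[ch])

if __name__ == "__main__":
    # Generators of the well-known five-qubit code, used here only as a test case.
    gens = [pauli_string_to_vector(s) for s in ("XZZXI", "IXZZX", "XIXZZ", "ZXIXZ")]
    print(is_self_orthogonal(gens, 2))
```

Two basis elements $N_x$ and $N_y$ commute exactly when $(x, y)_{\rm sp} = 0$, which is why a self-orthogonal $L$ corresponds to a commuting set of operators.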

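Lemma 1 Let a subspace $L \subset \mathbb{F}^{2n}$ satisfy

$$L \subset L^\perp \quad \text{and} \quad \dim L = n - k.$$

In addition, let $J_0 \subset \mathbb{F}^{2n}$ be a set satisfying

$$\forall x, y \in J_0,\ [\, y - x \in L^\perp \Rightarrow x = y \,]. \tag{4}$$

Then, there exist $d^k$-dimensional $\mathsf{N}_{J_0}$-correcting codes.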

In fact, given a subspace $L$ as above, there are $d^{n-k}$ subspaces of the form

$$\{\psi \in \mathsf{H}^{\otimes n} \mid \forall M \in \mathsf{N}_L,\ M\psi = \tau(M)\psi\},$$

with some scalars $\tau(M)$ (eigenvalues of $M \in \mathsf{N}_L$), and each of them, together with a suitable
recovery operator, serves as an $\mathsf{N}_{J_0}$-correcting quantum code of dimension $d^k$. Note that the
direct sum of these subspaces is the whole space $\mathsf{H}^{\otimes n}$. The precise meaning of $\mathsf{N}_{J_0}$-correcting
can be found, e.g., in Knill and Laflamme. Originally, Lemma 1 was claimed for the case
where $d = 2$, and has been generalized to the case where $d$ is a general prime.

By definition, for an $\mathsf{N}_{J_0}$-correcting code $(\mathcal{C}_n, \mathcal{R}_n)$ and the channel $\{\mathcal{A}_n\}$ in the theorem,
it holds that

$$1 - F(\mathcal{C}) \le \sum_{x \notin J_0} P_n(x), \tag{5}$$

where $F(\mathcal{C}) = F(\mathcal{C}, \mathcal{R}_n\mathcal{A}_n)$ with $\mathcal{C} = \mathcal{C}_n$. We remark that, as is usually done in the literature, it is assumed
in this paper that when we speak of an $\mathsf{N}_{J_0}$-correcting code $(\mathcal{C}_n, \mathcal{R}_n)$, the $\mathcal{R}_n$ indicates the
one constructed by Knill and Laflamme. Note that $\mathcal{R}_n$ is determined from $J_0$ and $\mathcal{C}$. The
premise (4) of Lemma 1 is restated as that $J_0$ is a set of representatives of cosets of $L^\perp$
in $\mathbb{F}^{2n}$. A natural choice for $J_0$ would be a set consisting of representatives each of which
maximizes the probability $P_n(x)$ in its coset, since it is analogous to maximum likelihood
decoding, which is an optimum strategy for classical coding (see Slepian or any textbook
of information theory). In the proof below, we choose another set of representatives, the
classical counterpart of which (minimum entropy decoding) asymptotically yields the same
performance as maximum likelihood decoding.
B. The Method of Types

The theorem can be proved along the lines of Ref. [?], which employed the method of
types. In the present case, second-order (Markov) types rather than the usual types
are used. Needed technical tools from the method of types in the Markov case can be found
in Csiszár et al. and papers cited therein. We collect here a few basic facts on this method
to be used below.

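For $x = (x_1, \dots, x_n) \in \mathcal{X}^n$, $n > 1$, define a probability distribution $M_x$ on $\mathcal{X}^2$ by

$$M_x(u, v) = \frac{|\{i \mid 1 \le i \le n-1,\ (x_i, x_{i+1}) = (u, v)\}|}{n - 1},$$

which is called the second-order type or Markov type of $x$. With $\mathcal{X}$ and an element $u \in \mathcal{X}$
fixed, the set of all possible Markov types of sequences $(x_1, \dots, x_n)$ from $\mathcal{X}^n$ satisfying
$x_1 = u$ is denoted by $\mathcal{Q}_n(\mathcal{X}, u)$ or simply by $\mathcal{Q}_n(u)$, and $\mathcal{Q}_n$ stands for $\bigcup_{u \in \mathcal{X}} \mathcal{Q}_n(u)$. For
a type $Q \in \mathcal{Q}_n(u)$, $\mathcal{T}_Q(u)$ is defined as $\{(x_1, \dots, x_n) \in \mathcal{X}^n \mid x_1 = u \text{ and } M_x = Q\}$, and $\mathcal{T}_Q$
denotes $\bigcup_{u \in \mathcal{X}} \mathcal{T}_Q(u)$.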
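As a concrete illustration of the definition, the Markov type of a short sequence can be computed by counting adjacent pairs. This is a minimal sketch in plain Python with exact rational arithmetic; the function names `markov_type` and `marginals` are illustrative.

```python
from collections import Counter
from fractions import Fraction

def markov_type(x):
    """Return M_x as a dict {(u, v): frequency of the adjacent pair (u, v) in x}."""
    n = len(x)
    if n < 2:
        raise ValueError("a Markov type needs a sequence of length at least 2")
    counts = Counter(zip(x, x[1:]))
    return {pair: Fraction(c, n - 1) for pair, c in counts.items()}

def marginals(Q):
    """Return the two marginals of Q: sums over the second and over the first argument."""
    first, second = Counter(), Counter()
    for (u, v), q in Q.items():
        first[u] += q
        second[v] += q
    return dict(first), dict(second)

if __name__ == "__main__":
    Q = markov_type((0, 1, 1, 0))
    print(Q)
    print(marginals(Q))
```

For the sequence $(0, 1, 1, 0)$ the three adjacent pairs $(0,1)$, $(1,1)$, $(1,0)$ each receive mass $1/3$, and the two marginals coincide because the first and last symbols are equal.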
In what follows, we use

$$|\mathcal{T}_Q(u)| \le \exp_d[(n-1) H(Q(\cdot|\cdot)\,|\,\bar{Q})], \qquad u \in \mathcal{X}. \tag{6}$$

Note that if $x = (x_1, \dots, x_n) \in \mathcal{X}^n$ with $x_1 = u$ has type $Q$, then
$P_n(x) = p(u) \prod_{a,b \in \mathcal{X}} P(b|a)^{(n-1)Q(a,b)} = p(u) \exp_d\{-(n-1)[H(Q(\cdot|\cdot)\,|\,\bar{Q}) + D(Q\|P)]\}$ and hence, (6) is
equivalent to the latter inequality in (39) of Csiszár et al., i.e.,

$$\Pr\{M_X = Q \mid X_1 = u\} \le \exp_d\{-(n-1) D(Q\|P)\}, \tag{7}$$

where the sequence of random variables $X = (X_1, \dots, X_n)$ represents the Markov chain in
the theorem, i.e., $\Pr\{X_1 = x_1, \dots, X_n = x_n\} = P_n(x_1, \dots, x_n)$ with $P_n$ defined in (2). Eq.
(6) or (7) is a consequence of Whittle's formula for $|\mathcal{T}_Q(u)|$, a simple proof of which was
given by Billingsley. The upper bound in (6) can be proved even more easily with a simple way
of enumeration (Davisson et al. or the paragraph containing (9) of Ref. [?]).
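Both the identity for $P_n(x)$ and the bound (6) can be checked by brute force for small $n$. The sketch below uses plain Python; the transition matrix passed to `sequence_probability` and `divergence` is any nested list with `P[a][b]` standing for $P(b|a)$, and all function names are illustrative. The logarithm base plays the role of $d$, and symbols are labeled by integers.

```python
import itertools
import math
from collections import Counter, defaultdict

def type_counts(x):
    """Counts of adjacent pairs, i.e. (n-1) M_x(a, b), and counts of the first symbol of each pair."""
    return Counter(zip(x, x[1:])), Counter(x[:-1])

def h_c(x, base):
    """Conditional entropy of the Markov type of x given its first marginal."""
    pairs, firsts = type_counts(x)
    m = len(x) - 1
    return -sum(c / m * math.log(c / firsts[a], base) for (a, _), c in pairs.items())

def divergence(x, P, base):
    """D(M_x || P) with P[a][b] = P(b|a); infinite if x uses a forbidden transition."""
    pairs, firsts = type_counts(x)
    m = len(x) - 1
    total = 0.0
    for (a, b), c in pairs.items():
        if P[a][b] == 0:
            return math.inf
        total += c / m * math.log((c / firsts[a]) / P[a][b], base)
    return total

def sequence_probability(x, p, P):
    """P_n(x) = p(x_1) * prod_j P(x_{j+1} | x_j)."""
    prob = p[x[0]]
    for a, b in zip(x, x[1:]):
        prob *= P[a][b]
    return prob

def check_type_class_bound(alphabet_size, n, base):
    """Check |T_Q(u)| <= base^{(n-1) H_c(Q)} for every type class, by enumeration."""
    sizes, entropy = defaultdict(int), {}
    for x in itertools.product(range(alphabet_size), repeat=n):
        pairs, _ = type_counts(x)
        key = (x[0], frozenset(pairs.items()))  # first symbol together with the type
        sizes[key] += 1
        entropy[key] = h_c(x, base)
    # Small relative tolerance guards against floating-point rounding at equality.
    return all(s <= base ** ((n - 1) * entropy[k]) * (1 + 1e-9) for k, s in sizes.items())
```

Enumeration is exponential in $n$, so this is only a sanity check for toy sizes such as an alphabet of four symbols ($d = 2$) and $n = 5$.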



C. Proof of Theorem 1

The case where $R = 1$ is trivial, so that we assume $R < 1$ from now on. Putting
$k = \lceil Rn \rceil$, we apply Lemma 1, where we choose $J_0$ as follows. Assume $\dim L = n - k$.
Then, $\dim L^\perp = n + k$. For notational simplicity, we write $H_c(Q)$ in place of $H(Q(\cdot|\cdot)\,|\,\bar{Q})$
for a probability distribution $Q$ on $\mathcal{X}^2$. From each of the $d^{n-k}$ cosets of $L^\perp$ in $\mathbb{F}^{2n}$, select a
vector that minimizes $H_c(M_x)$, i.e., a vector $x$ satisfying $H_c(M_x) \le H_c(M_y)$ for any $y$ in the
coset. This selection uses the idea of the minimum entropy decoder known in the classical
information theory literature.
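The selection of $J_0$ can be made concrete for toy parameters by enumerating $L^\perp$ and its cosets. This is a brute-force sketch in plain Python, feasible only for very small $n$; ties between equally good candidates are broken lexicographically, which is an arbitrary choice made here and not one specified in the text. All function names are illustrative.

```python
import itertools
import math
from collections import Counter

def symplectic_form(x, y, d):
    """(x, y)_sp for x = (u_1, v_1, ..., u_n, v_n), reduced mod d."""
    return sum(x[i] * y[i + 1] - x[i + 1] * y[i] for i in range(0, len(x), 2)) % d

def h_c_of_vector(x, d):
    """H_c(M_x) in base d, reading x as the sequence of n symbols (u_i, v_i)."""
    symbols = list(zip(x[0::2], x[1::2]))
    if len(symbols) < 2:
        return 0.0
    pairs = Counter(zip(symbols, symbols[1:]))
    firsts = Counter(symbols[:-1])
    m = len(symbols) - 1
    return -sum(c / m * math.log(c / firsts[a], d) for (a, _), c in pairs.items())

def minimum_entropy_representatives(generators, d):
    """Return (J_0, L_perp): one minimum-H_c vector per coset of L_perp in F^{2n}."""
    length = len(generators[0])
    space = list(itertools.product(range(d), repeat=length))
    # x is orthogonal to L iff it is orthogonal to every generator of L.
    perp = [x for x in space if all(symplectic_form(x, g, d) == 0 for g in generators)]
    seen, reps = set(), []
    for x in space:
        if x in seen:
            continue
        coset = {tuple((a + b) % d for a, b in zip(x, z)) for z in perp}
        seen |= coset
        reps.append(min(coset, key=lambda y: (h_c_of_vector(y, d), y)))
    return reps, perp

if __name__ == "__main__":
    # Toy case d = 2, n = 3, k = 1: L spanned by the indices of ZZI and IZZ.
    gens = [(0, 1, 0, 1, 0, 0), (0, 0, 0, 1, 0, 1)]
    J0, perp = minimum_entropy_representatives(gens, 2)
    print(len(perp), len(J0))
```

In the toy case the enumeration yields $|L^\perp| = d^{n+k} = 16$ and $d^{n-k} = 4$ representatives, and distinct representatives lie in distinct cosets, so the premise (4) of Lemma 1 holds by construction.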

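Let $J_0(L)$ denote the set of the $d^{n-k}$ selected vectors, let

$$\mathsf{A} = \{L \subset \mathbb{F}^{2n} \mid L \text{ linear},\ L \subset L^\perp,\ \dim L = n - k\}$$

and for each $L \in \mathsf{A}$, let $\mathcal{C}(L)$ be an $\mathsf{N}_{J_0(L)}$-correcting code, the existence of which is ensured by
Lemma 1. Putting

$$\bar{F} = \frac{1}{|\mathsf{A}|} \sum_{L \in \mathsf{A}} F(\mathcal{C}(L)),$$

we will show $\liminf_{n} -n^{-1} \log_d(1 - \bar{F}) \ge E(R, P)$, which implies that at least one sequence
of codes has fidelity as high as promised in the theorem. Such a method for a proof is
referred to as random coding.

As in the proof of Theorem 1 of Ref. [?], we have

$$1 - \bar{F} \le \sum_{x \in \mathbb{F}^{2n}} \frac{|\mathsf{B}(x)|}{|\mathsf{A}|} P_n(x), \tag{8}$$

where

$$\mathsf{B}(x) = \{L \in \mathsf{A} \mid x \notin J_0(L)\}, \qquad x \in \mathbb{F}^{2n}.$$

The fraction $|\mathsf{B}(x)|/|\mathsf{A}|$ is trivially bounded as

$$\frac{|\mathsf{B}(x)|}{|\mathsf{A}|} \le 1, \qquad x \in \mathbb{F}^{2n}. \tag{9}$$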



We use the next inequality. Let

$$\mathsf{A}(x) = \{L \in \mathsf{A} \mid x \in L^\perp \setminus \{0\}\}.$$

Then, $|\mathsf{A}(0)| = 0$ and

$$\frac{|\mathsf{A}(x)|}{|\mathsf{A}|} = \frac{d^{n+k} - 1}{d^{2n} - 1} \le \frac{1}{d^{n-k}}, \qquad x \in \mathbb{F}^{2n},\ x \ne 0. \tag{10}$$

This is a variant of the relation established by Calderbank et al., or its analog proved by
Matsumoto and Uyematsu with an explicit use of the Witt lemma from the theory of
bilinear forms.
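The ratio in (10) can be confirmed by direct counting in the simplest special case $\dim L = 1$, i.e., $k = n - 1$, where every one-dimensional subspace is self-orthogonal because $(z, z)_{\rm sp} = 0$. The sketch below, in plain Python with exact fractions, covers only this special case and is not a check of the general statement; the function names are illustrative.

```python
import itertools
from fractions import Fraction

def symplectic_form(x, y, d):
    """(x, y)_sp for x = (u_1, v_1, ..., u_n, v_n), reduced mod d."""
    return sum(x[i] * y[i + 1] - x[i + 1] * y[i] for i in range(0, len(x), 2)) % d

def fraction_of_lines_orthogonal_to(x, d):
    """|A(x)|/|A| when A is the set of all one-dimensional subspaces span{z}.

    Each line has d - 1 nonzero generators, all orthogonal to x or none,
    so the ratio over nonzero vectors z equals the ratio over lines.
    """
    nonzero = [z for z in itertools.product(range(d), repeat=len(x)) if any(z)]
    hits = sum(1 for z in nonzero if symplectic_form(x, z, d) == 0)
    return Fraction(hits, len(nonzero))

def ratio_in_eq_10(d, n, k):
    """The closed form (d^{n+k} - 1) / (d^{2n} - 1)."""
    return Fraction(d ** (n + k) - 1, d ** (2 * n) - 1)

if __name__ == "__main__":
    d, n = 3, 2
    k = n - 1
    x = (1, 0, 2, 1)  # an arbitrary nonzero vector in F_3^4
    print(fraction_of_lines_orthogonal_to(x, d), ratio_in_eq_10(d, n, k))
```

For $d = 3$, $n = 2$ the count gives $26/80 = 13/40$, which agrees with the closed form and is below $1/d^{n-k} = 1/3$.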

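Since $\mathsf{B}(x) \subset \{L \in \mathsf{A} \mid \exists y \in \mathbb{F}^{2n},\ H_c(M_y) \le H_c(M_x),\ y - x \in L^\perp \setminus \{0\}\}$ from the design of
$J_0(L)$ specified above (cf. Goppa), it follows that

$$|\mathsf{B}(x)| \le \sum_{y \in \mathbb{F}^{2n}:\, H_c(M_y) \le H_c(M_x)} |\mathsf{A}(y - x)| \le \sum_{y \in \mathbb{F}^{2n}:\, H_c(M_y) \le H_c(M_x)} |\mathsf{A}|\, d^{-n+k}, \tag{11}$$

where we have used (10) for the latter inequality. Combining (8), (9) and (11), we obtain
the following chain of inequalities with the aid of the basic inequalities in (6) and (7) as well
as the inequality $\min\{a + b, 1\} \le \min\{a, 1\} + \min\{b, 1\}$ for $a, b \ge 0$:

$$\begin{aligned}
1 - \bar{F}
&\le \sum_{x \in \mathcal{X}^n} P_n(x) \min\Big\{ \sum_{y:\, H_c(M_y) \le H_c(M_x)} d^{-(n-k)},\ 1 \Big\} \\
&\le \sum_{u \in \mathcal{X}} p(u) \sum_{Q \in \mathcal{Q}_n(u)} \Pr\{M_X = Q \mid X_1 = u\} \min\Big\{ \sum_{Q' \in \mathcal{Q}_n:\, H_c(Q') \le H_c(Q)} \frac{|\mathcal{T}_{Q'}|}{d^{n-k}},\ 1 \Big\} \\
&\le d^3 \sum_{u \in \mathcal{X}} p(u) \sum_{Q \in \mathcal{Q}_n} \exp_d[-(n-1) D(Q\|P)] \sum_{Q' \in \mathcal{Q}_n:\, H_c(Q') \le H_c(Q)} \exp_d[-(n-1)|1 - R - H_c(Q')|^+] \\
&\le d^3 \sum_{Q \in \mathcal{Q}_n} \exp_d[-(n-1) D(Q\|P)]\, |\mathcal{Q}_n| \max_{Q' \in \mathcal{Q}_n:\, H_c(Q') \le H_c(Q)} \exp_d[-(n-1)|1 - R - H_c(Q')|^+] \\
&\le d^3 \sum_{Q \in \mathcal{Q}_n} \exp_d[-(n-1) D(Q\|P)]\, |\mathcal{Q}_n| \exp_d[-(n-1)|1 - R - H_c(Q)|^+] \\
&\le d^3 |\mathcal{Q}_n|^2 \exp_d\Big\{-(n-1) \min_{Q \in \mathcal{Q}_n} \big[ D(Q\|P) + |1 - R - H_c(Q)|^+ \big]\Big\}.
\end{aligned}$$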
Since $|\mathcal{Q}_n|$ is polynomial in $n$, the remaining task is to show that

$$\liminf_{n \to \infty} \min_{Q \in \mathcal{Q}_n} \big[ D(Q\|P) + |1 - R - H_c(Q)|^+ \big]$$

is not less than

$$\min_{Q:\, \|\bar{Q} - \bar{\bar{Q}}\| = 0} \big[ D(Q\|P) + |1 - R - H_c(Q)|^+ \big],$$

which is $E(R, P)$. One sees immediately that this holds by noticing that any $Q \in \mathcal{Q}_n$ satisfies
$\|\bar{Q} - \bar{\bar{Q}}\| \le 1/(n-1)$ for the norm $\|(a_0, \dots, a_{d^2-1})\| = \max_j |a_j|$, that the set of all probability distributions
is compact, and that $D(Q) = D(Q\|P)$ is continuous in its effective domain $\{Q \mid D(Q) < \infty\}$
(cf. the proof of Lemma 2 in Csiszár et al.). This completes the proof.
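The fact that the two marginals of a Markov type differ by at most $1/(n-1)$ in the max norm follows because they count the same symbols except for $x_1$ and $x_n$. A short exhaustive check for small cases, in plain Python with illustrative function names, reads as follows.

```python
import itertools
from collections import Counter
from fractions import Fraction

def marginal_gap(x):
    """max_u |first marginal(u) - second marginal(u)| for the Markov type of x."""
    n = len(x)
    first = Counter(x[:-1])   # first components of the pairs (x_i, x_{i+1})
    second = Counter(x[1:])   # second components of the pairs
    return max(abs(Fraction(first[u] - second[u], n - 1)) for u in set(x))

def worst_gap(alphabet_size, n):
    """Largest marginal gap over all sequences of length n, by enumeration."""
    return max(marginal_gap(x) for x in itertools.product(range(alphabet_size), repeat=n))

if __name__ == "__main__":
    print(worst_gap(3, 5))
```

The gap vanishes exactly when $x_1 = x_n$, and otherwise equals $1/(n-1)$, so the types in $\mathcal{Q}_n$ approach the constraint set $\bar{Q} = \bar{\bar{Q}}$ as $n$ grows.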



IV. MAIN RESULT FOR GENERAL CASE 

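Theorem 1 actually holds for a wider class of channels. To evaluate the fidelity of codes
on a more general channel $\{\mathcal{A}_n\}$, we first associate a sequence of probability distributions
$\{P_{\mathcal{A}_n}\}$ with the channel $\{\mathcal{A}_n\}$ as in Ref. [?].

Definition 1 For each $n$, let $\mathcal{A}_n \sim \{A_x^{(n)}\}_{x \in \mathcal{X}^n}$; expand $A_x^{(n)}$ as $A_x^{(n)} = \sum_{y \in \mathcal{X}^n} a_{xy} N_y$, $x \in \mathcal{X}^n$,
and define a probability distribution $P_{\mathcal{A}_n}$ on $\mathcal{X}^n$ by

$$P_{\mathcal{A}_n}(y) = \sum_{x \in \mathcal{X}^n} |a_{xy}|^2, \qquad y \in \mathcal{X}^n.$$

Example. Let $\{\mathcal{A}_n\}$ be a memoryless channel $\mathcal{A}_n = \mathcal{A}^{\otimes n}$, $n = 1, 2, \dots$. It is easy to see
that $P_{\mathcal{A}_n}(y_1, \dots, y_n) = \prod_{i=1}^{n} P_{\mathcal{A}}(y_i)$.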
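For a single system ($n = 1$), the distribution associated with a channel can be computed from its operator-sum representation by expanding each operator in the Weyl basis. The sketch below assumes NumPy, the choice $\omega = e^{2\pi i/d}$, and the formula $P_{\mathcal{A}}(y) = \sum_x |a_{xy}|^2$ with $a_{xy} = \mathrm{tr}(N_y^\dagger A_x)/d$; the latter expression for the coefficients relies on the orthogonality of the Weyl basis under the trace inner product. Function names are illustrative.

```python
import numpy as np

def weyl_basis(d):
    """Return {(i, j): X^i Z^j} with X|j> = |(j-1) mod d> and Z|j> = omega^j |j>."""
    omega = np.exp(2j * np.pi / d)  # assumed choice of primitive d-th root of unity
    X = np.zeros((d, d), dtype=complex)
    for j in range(d):
        X[(j - 1) % d, j] = 1.0
    Z = np.diag(omega ** np.arange(d))
    return {
        (i, j): np.linalg.matrix_power(X, i) @ np.linalg.matrix_power(Z, j)
        for i in range(d)
        for j in range(d)
    }

def channel_distribution(kraus, d):
    """P_A(y) = sum_x |a_{xy}|^2, where A_x = sum_y a_{xy} N_y (single system, n = 1)."""
    dist = {}
    for y, N in weyl_basis(d).items():
        # Coefficient of N_y in A_x, obtained from tr(N_y^dagger N_z) = d * delta_{yz}.
        dist[y] = sum(abs(np.trace(N.conj().T @ A) / d) ** 2 for A in kraus)
    return dist

if __name__ == "__main__":
    # Amplitude damping with a hypothetical damping parameter g, as a non-Pauli test case.
    g = 0.2
    K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
    print(channel_distribution([K0, K1], 2))
```

For a channel of the form $\{\sqrt{P(y)}\,N_y\}$ this returns $P$ itself, and for any trace-preserving map the values sum to one.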

The case of memoryless channels as above was discussed in this author's previous work.
This work claims the next.

Theorem 2 Consider a channel $\{\mathcal{A}_n\}$ whose $\{P_n = P_{\mathcal{A}_n}\}$ satisfies (2) with some $P(\cdot|\cdot)$ and
$p$. Then, again, for $0 \le R \le 1$, (3) in Theorem 1 holds.



The above theorem can be proved along the lines of this author's previous work, which
treated general memoryless quantum channels. Namely, Theorem 1 can be generalized to
Theorem 2 in the same way as the result in Ref. [?] was strengthened in Ref. [?]. Here it is
briefly described how to prove Theorem 2. First, we evaluate the minimum average fidelity
$F_a(\mathcal{C})$, which is another performance measure for a code $\mathcal{C}$ introduced in Ref. [?], instead of the
minimum fidelity $F(\mathcal{C})$. Actually, we evaluate the average of $F_a(\mathcal{C})$ over the whole ensemble
of quantum codes $\{\mathcal{C}(L, i) \mid L \in \mathsf{A},\ 0 \le i < d^{n-k}\}$, where $\mathcal{C}(L, i)$, $i = 0, \dots, d^{n-k} - 1$, are
the $d^{n-k}$ quantum codes associated with $L$ as in Lemma 1; compare the proof of Theorem 1
above, where using an arbitrarily chosen code $\mathcal{C}(L, i)$ for each $L$ was enough. The average of
$F_a(\mathcal{C}(L, i))$ turns out to be lower bounded by $1 - \exp_d[-nE(R, P) + o(n)]$. Then, at least
one code $\mathcal{C}(L, i)$ has this performance or higher. As proved in Ref. [?], if we have a code with
$1 - F_a(\mathcal{C}) \le G$, we can choose a subcode $\mathcal{C}'$ of half the dimension with $1 - F(\mathcal{C}') \le 2G$,
which implies Theorem 2.

The major difficulty in the analysis on general channels lay in the fact that (5) is no
longer true in the general case; this was resolved in Ref. [?] by proving that (5) holds true if
we replace $F(\mathcal{C}) = F(\mathcal{C}(L))$ by $F_a(\mathcal{C}(L, i))$ averaged over $0 \le i < d^{n-k}$.

We remark that the result of this paper readily extends to the case where $P_n$ are the
probability distributions of a higher-order Markov chain. For this extension, we have only
to use higher-order types instead of second-order types.



V. CONCLUDING REMARKS 

It should be remarked that the lower bound $1 - H(P|q)$ on the quantum capacity is not
tight in general since there is an example of a code which slightly goes beyond the bound for
some very noisy memoryless channels. This work, however, seems the first to demonstrate
that standard error correction schemes work reliably even in the presence of correlated errors
with positive information rate for all large enough code lengths. Moreover, the established
convergence of the fidelity is exponential. Research in this direction is yet to be developed
in quantum information theory, while exponent problems have already been central issues
in other fields including large-deviation theory and classical information theory.